Proceedings Template - WORD

Authors

  • Jing Bai
  • Jian-Yun Nie
Abstract

This paper describes an approach to text classification using language models. The approach is a natural extension of the traditional Naïve Bayes classifier, in which we replace Laplace smoothing with more sophisticated smoothing methods. We tested four smoothing methods commonly used in information retrieval. Our experimental results show that a language model achieves better performance than the traditional Naïve Bayes classifier. In addition, we introduce into the existing smoothing methods a smoothing scale factor that depends on the amount of training data available for each class, which further improves classification performance.
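The difference between Laplace smoothing and a language-model smoothing method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the four smoothing methods tested, so Jelinek-Mercer interpolation with a collection model is used here as an assumed example of a smoothing method common in information retrieval. The function names, the toy data, and the value `lam=0.5` are all hypothetical, and the class-dependent smoothing scale factor mentioned in the abstract is not shown because the abstract gives no formula for it.

```python
import math
from collections import Counter


def train(docs):
    """Collect per-class word counts, collection-wide counts, and class priors.

    docs is a list of (label, tokens) pairs.
    """
    class_counts = {}
    class_docs = Counter()
    collection = Counter()
    for label, tokens in docs:
        class_counts.setdefault(label, Counter()).update(tokens)
        collection.update(tokens)
        class_docs[label] += 1
    total_docs = sum(class_docs.values())
    priors = {c: n / total_docs for c, n in class_docs.items()}
    return class_counts, collection, priors


def laplace_prob(word, counts, vocab_size):
    """Add-one (Laplace) estimate of P(word | class)."""
    return (counts[word] + 1) / (sum(counts.values()) + vocab_size)


def jelinek_mercer_prob(word, counts, collection, lam):
    """Interpolate the class model with the collection model (assumed method)."""
    p_class = counts[word] / sum(counts.values())
    p_coll = collection[word] / sum(collection.values())
    return (1 - lam) * p_class + lam * p_coll


def classify(tokens, model, smoothing="jm", lam=0.5):
    """Return the class with the highest log posterior score."""
    class_counts, collection, priors = model
    vocab_size = len(collection)
    scores = {}
    for label, counts in class_counts.items():
        score = math.log(priors[label])
        for word in tokens:
            if word not in collection:
                continue  # words never seen in training are skipped
            if smoothing == "laplace":
                p = laplace_prob(word, counts, vocab_size)
            else:
                p = jelinek_mercer_prob(word, counts, collection, lam)
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data (hypothetical, for illustration only).
docs = [
    ("sports", ["ball", "goal", "team"]),
    ("sports", ["team", "win", "goal"]),
    ("tech", ["code", "bug", "compile"]),
    ("tech", ["code", "team", "release"]),
]
model = train(docs)
print(classify(["goal", "team"], model, smoothing="jm"))
print(classify(["goal", "team"], model, smoothing="laplace"))
```

With Laplace smoothing, an unseen word receives the same pseudo-count in every class. With interpolation, an unseen word falls back to its frequency in the whole collection, so frequent and rare words are treated differently.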

Similar resources

Proceedings Template - WORD

Path loss and delay profile models for ITS applications, based on data measured in the 700 MHz band, are presented.

Proceedings Template - WORD

Preserving privacy while publishing social network data has become a serious issue with the rapid growth of social networks. In this work, we propose a perturbation-based approach for privacy-preserving publication of social network graphs and evaluate the utility of the proposed method using a real-world dataset.

Proceedings Template - WORD

This poster presents a computational analysis of conceptual metaphors in a community of political blogs. Like sentiment analysis or opinion extraction, computational metaphor identification can provide a means of understanding the particular framings or conceptualizations used in a community. This poster includes an overview of the implementation and a summary of results.

POC-NLW Template for Chinese Word Segmentation

In this paper, a language tagging template named POC-NLW (position of a character within an n-length word) is presented. Based on this template, a two-stage statistical model for Chinese word segmentation is constructed. In this method, the basic word segmentation is based on an n-gram language model, and a Hidden Markov tagger based on the POC-NLW template is used to implement the out-of-vocabular...

CASRA+: A Colloquial Arabic Speech Recognition Application

The research proposed here is for an Arabic speech recognition application, concentrating on the Lebanese dialect. The system starts by sampling the speech, that is, converting the sound from analog to digital, and then extracts features using Mel-Frequency Cepstral Coefficients (MFCC). The extracted features are then compared with the system's stored model; in this...

Proceedings Template - WORD

Most of the approaches for dealing with uncertainty in the Semantic Web rely on the principle that this uncertainty is already asserted. In this paper, we propose a new approach to learn and reason about uncertainty in the Semantic Web. Using instance data, we learn the uncertainty of an OWL ontology, and use that information to perform probabilistic reasoning on it. For this purpose, we use Ma...

Publication date: 2004